Skip to content

TST: Ensure dtypes are set correctly for empty integer columns #24386 #34886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2020

Conversation

avinashpancham
Copy link
Contributor

@avinashpancham avinashpancham commented Jun 20, 2020

@charlesdong1991 charlesdong1991 added the Testing pandas testing functions or related to the test suite label Jun 20, 2020
Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! one nitpick

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

I would maybe move the test to tests/frame/test_constructors.py



def test_check_dtype_empty_column():
# GH24386: Ensure dtypes are set correctly for empty integer columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add to the comment "dictionary data with non-overlapping columns, resulting in an empty DataFrame", as it is specifically this case (not just an empty dataframe)

@avinashpancham
Copy link
Contributor Author

avinashpancham commented Jun 20, 2020

Thanks. Can I add the test at the bottom of the class TestDataFrameConstructors in tests/frame/test_constructors.py or should I add it in a specific section of the class?

@jreback jreback added Constructors Series/DataFrame/Index/pd.array Constructors Dtype Conversions Unexpected or buggy dtype conversions labels Jun 20, 2020

def test_check_dtype_empty_column():
# GH24386: Ensure dtypes are set correctly for empty integer columns
data = pd.DataFrame({"a": [1, 2]}, columns=["b"], dtype=int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls parameterize over the dtypes mentioned in the OP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in test_constructors.py there are already similar tests, please move near them

@jreback jreback added this to the 1.1 milestone Jun 20, 2020
("timedelta64[ns]", np.dtype("<m8[ns]")),
("bool", np.dtype("bool")),
("object", np.dtype("O")),
("category", CategoricalDtype(categories=[], ordered=False)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great as long as you are adding a list, can you add the rest of the types. you ca easily do this by using the types defined in pandas._testing (we also have fixtures for this, but maybe not one that covers all types); would be ok with one for that as well. Note that I don't think we have Boolean, String defined pandas._testing (could add them there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more dtypes check, but now the list is very long. Any ideas on how to makes more pretty? I did not want to add it to the lists in pandas._testing, since I was not sure it would be compatible with other tests that rely on them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to do it like this, rather use the already defined ones that are ther
e.g. ALL_EA_INT_DTYPES

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any tips on how to refactor the parametrization if I use ALL_EA_INT_DTYPES? The parametrization now has the form [(input_value, expected_value)], but if I loop over ALL_EA_INT_DTYPES I think I still need to manually provide the expected value every time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how's that? the expected IS the same as the input

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, you can easily tranform these:

In [1]: pd.Int64Dtype().name                                                                                                                                                                                        
Out[1]: 'Int64'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so my points is that you can simply pass in the ALL_* (just concatenate them all), and check the dtypes of the result columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, that already helps me a lot. But I for int this would not work since the expected is 'Int64'. Similarly for float, str.

Sorry for the large number of questions, this is my first PR :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure so what you can do is break this into 2 tests, one for the easy cases where this is true and one for the special cases (where you can input 2 args for input and expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I think I came to a much better test now.

@jreback jreback merged commit 8020543 into pandas-dev:master Jun 20, 2020
@jreback
Copy link
Contributor

jreback commented Jun 20, 2020

thanks @avinashpancham

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors Dtype Conversions Unexpected or buggy dtype conversions Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame constructor ignores integer dtype when dict-data and non-overlapping columns
4 participants